# Multimodal Input
Gemma 3 12b Pt Qat Q4 0 Gguf
Gemma 3 is a lightweight open multimodal model from Google that accepts text and image input and produces text output, with a 128K-token context window and support for more than 140 languages.
Image-to-Text
google
475
12
3DTopia-XL
Apache-2.0
3DTopia-XL is a diffusion Transformer built on PrimX, an efficient 3D representation, and can rapidly generate high-quality 3D assets.
3D Vision
FrozenBurning
129
45
Sam2 Hiera Base Plus
Apache-2.0
SAM 2 is a foundation model from FAIR for promptable visual segmentation in images and videos.
Image Segmentation
facebook
18.17k
6